109 results found.
Speech/Written
Corpus,
Language Type:
Multilingual
Languages:
Dutch English French German Italian Polish Portuguese Spanish
Availability:
Freely Available
License:
CC BY 4.0
Size:
None
Production Status:
Existing-used
Use:
Information Extraction, Information Retrieval
-
Paper title:LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech
-
Paper track:8.1 Feature extraction and low-level feature model/Oral Presentation
-
Paper status:Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Laurent Besacier | Multilingual LibriSpeech (MLS) | /N |
Documentation:
https://arxiv.org/abs/2012.03411, English, public
Speech/Written
Corpus,
Language Type:
Multilingual
Languages:
Catalan Chinese English Esperanto French German Italian Kabyle Kinyarwanda Persian Polish Russian Spanish Welsh
Availability:
Freely Available
License:
Creative Commons license
Size:
8.8k hours
Production Status:
Existing-used
Use:
Speech Recognition/Understanding
-
Paper title:LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech
-
Paper track:8.1 Feature extraction and low-level feature model/Oral Presentation
-
Paper status:Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Laurent Besacier | Common Voice | /N |
Documentation:
https://arxiv.org/pdf/1912.06670.pdf, English, public
Written
Corpus,
Language Type:
Multilingual
Languages:
Afrikaans Albanian Amharic Arabic Aragonese Armenian Assamese Azerbaijani Basque Belarusian Bengali Bosnian Breton Bulgarian Burmese Catalan Central Khmer Chinese Croatian Czech Danish Dutch Dzongkha English Esperanto Estonian Finnish French Gaelic Galician Georgian German Greek Gujarati Hausa Hebrew Hindi Hungarian Icelandic Igbo Indonesian Irish Italian Japanese Kannada Kazakh Kinyarwanda Korean Kurdish Kyrgyz Latvian Limburgan Lithuanian Macedonian Malagasy Malay Malayalam Maltese Marathi Mongolian Nepali Northern Sami Norwegian Norwegian Bokmål Norwegian Nynorsk Occitan Oriya Panjabi Pashto Persian Polish Portuguese Romanian Russian Serbian Serbo-Croatian Sinhala Slovak Slovenian Spanish Swedish Tajik Tamil Tatar Telugu Thai Turkish Turkmen Uighur Ukrainian Urdu Uzbek Vietnamese Walloon Welsh Western Frisian Xhosa Yiddish Yoruba Zulu
Availability:
Freely Available
License:
Size:
55 million sentences
Production Status:
Existing-used
Use:
Machine Translation, SpeechToSpeech Translation
-
Paper title:Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation
-
Paper track:Long/Machine Translation
-
Paper status:Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Biao Zhang | the open parallel corpus (OPUS) | /N |
Documentation:
None
Not Applicable
Contextualised word embeddings,
Language Type:
Monolingual
Languages:
Ancient Arabic Basque Bokmål Bulgarian Catalan Chinese Church Croatian Czech Danish Dutch English Estonian Finnish French Galician German Greek Hebrew Hindi Hungarian Indonesian Irish Italian Japanese Korean Latin Latvian Norwegian Nynorsk Old Persian Polish Portuguese Romanian Russian Simplified Chinese Slavonic Slovak Slovene Spanish Swedish Turkish Ukrainian Urdu Uyghur Vietnamese
Availability:
Freely Available
License:
none
Size:
18.4 GByte
Production Status:
Existing-used
Use:
Parsing and Tagging
-
Paper title:Treebank Embedding Vectors for Out-of-domain Dependency Parsing
-
Paper track:Short/Syntax: Tagging, Chunking and Parsing
-
Paper status:Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Joachim Wagner | Elmo For Many Languages | /N |
Documentation:
https://www.aclweb.org/anthology/K18-2005/
Speech
Corpus,
Language Type:
Monolingual
Languages:
Bengali Czech Dari English Hindi Lao Mandarin Chinese Mesopotamian Arabic Moroccan Arabic North Levantine Arabic Panjabi Persian Polish Pushto Russian Slovak South Levantine Arabic Spanish Standard Arabic Tamil Thai Turkish Ukrainian Urdu
Availability:
From Owner
License:
LDC
Size:
204 hours
Production Status:
Existing-used
Use:
Language Identification
-
Paper title:Metric learning loss functions to reduce domain mismatch in the x-vector space for language recognition
-
Paper track:4.1 Language identification and verification, lang/Oral Presentation
-
Paper status:Accept - Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Raphaël Duroselle | 2011 NIST Language Recognition Evaluation Test Set | /N |
Documentation:
None
Written
Corpus,
Language Type:
Monolingual
Languages:
Polish
Availability:
Freely Available
License:
Creative Commons
Size:
61315 tokens
Production Status:
Existing-updated
Use:
Information Extraction, Information Retrieval
-
Paper title:PST 2.0 – Corpus of Polish Spatial Texts
-
Paper track:Written/poster presentation
-
Paper status:Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Michał Marcińczuk | Corpus of Polish Spatial Texts 2.0 (PST 2.0) | /N |
Documentation:
None
Written
Corpus,
Language Type:
Multilingual
Languages:
English Polish Russian
Availability:
Freely Available
License:
Size:
None
Production Status:
Existing-used
Use:
Machine Learning
-
Paper title:Neural Text Denormalization for Speech Transcripts
-
Paper track:10.4 Rich transcription/Oral Presentation
-
Paper status:Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Benjamin Suter | Text Normalization Data (Sproat & Jaitly 2017) | /N |
Documentation:
None
Speech/Written
Corpus,
Language Type:
Multilingual
Languages:
Bulgarian Croatian Czech French German Mandarin Polish Portuguese Spanish Thai Turkish
Availability:
From Data Center(s)
License:
ELRA
Size:
18.7 GByte
Production Status:
Existing-used
Use:
Speech Recognition/Understanding
-
Paper title:Zero-shot Cross-Lingual Phonetic Recognition with External Language Embedding
-
Paper track:8.11 Cross-lingual and multilingual/accent aspects/Poster Presentation
-
Paper status:Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Heting Gao | GlobalPhone | /N |
Documentation:
None
Written
Corpus,
Language Type:
Multilingual
Languages:
Hebrew Hindi Polish
Availability:
Freely Available
License:
Creative Commons
Size:
277.701 sentences
Production Status:
Existing-used
Use:
Extraction of Multiword Expressions
-
Paper title:Verbal Multiword Expression Identification: Do We Need a Sledgehammer to Crack a Nut?
-
Paper track:Long paper/
-
Paper status:Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Carlos Ramisch | PARSEME corpora | /N |
Documentation:
None
Corpus,
Language Type:
Multilingual
Languages:
German Italian Polish
Availability:
License:
Size:
None
Production Status:
Existing-used
Use:
-
Paper title:The CLARIN Knowledge Centre for Atypical Communication Expertise
-
Paper track:Infrastructural Issues/Large Projects/poster presentation
-
Paper status:Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Henk van den Heuvel | The P-MoLL corpus | /N |
Documentation:
Dittmar, N., Reich, A., Skiba, R., Schumacher, M., & Terborg, H. (1990). Die Erlernung modaler Konzepte des Deutschen durch erwachsene polnische Migranten: Eine empirische Längsschnittstudie. In: Informationen Deutsch als Fremdsprache: Info DaF 17(2), pp. 125-172.




